Hurst Exponents For Short Time Series
A new concept, called balanced estimator of diffusion entropy, is proposed to detect scalings in
short time series. The effectiveness of the method is verified by means of a large number of artificial
fractional Brownian motions. It is also used to detect scaling properties and structural breaks in
stock price series of the Shanghai stock market.
I. INTRODUCTION

Scale invariance has been found to hold empirically for
a number of complex systems [1]. Consider a stochastic
trajectory X(t), whose statistical properties are described
by the probability distribution function (PDF) of
the displacements, $p(x,t)$. The stochastic process represented
by X(t) is scale invariant provided the PDF satisfies

$$p(x,t) = \frac{1}{t^{\delta}} F\left(\frac{x}{t^{\delta}}\right), \tag{1}$$

where $\delta$ is the scaling exponent. Ordinary statistical mechanics
is intimately related to the Central Limit Theorem,
which implies the Gaussian form of the function
$F(\cdot)$ and $\delta = 0.5$ [3]. By using the scaling exponents
one can describe quantitatively the deviations from ordinary
mechanics, and consequently assess the real physical
nature of a phenomenon. But the evaluation of the scaling
exponents meets several challenges.

Evaluating a reliable scaling exponent is not a trivial task.
Variance-based methods are widely used in the literature to
calculate the scaling exponents [4]. They assume that the
variance depends on time through the scaling exponent $\delta$,
namely, $\mathrm{Var}(X(t)) \propto t^{2\delta}$. This holds for
fractional Brownian motions, but there are scaling processes such
as Levy flights, for which the second moment diverges, or Levy
walks, for which the second moment satisfies a scaling
relation $\mathrm{Var}(X(t)) \sim t^{2H}$ with $H \neq \delta$, i.e., the
assumed relationship is violated [5]. Several efforts have been
made to develop complementary methods to evaluate reliable
scaling exponents [6], [7]. To cite an example,
from the PDF one can calculate the Shannon entropy,
$S(t) = -\int p(x,t) \ln p(x,t)\, dx$, which was first identified
as diffusion entropy by Scafetta et al. [6]. It has been
shown that the diffusion entropy provides reliable
values of $\delta$ for both fractional Brownian motions
and Levy processes.



For a real-world stochastic process, the PDF is generally
not known. One can count how often the value x
appears in the data set of the trajectory X(t). Denoting this
number by n(x), the PDF can be estimated by the
relative frequency, $n(x)/N$, where N is the total size of the data
set. In many situations only small data sets are available from
which to infer the PDF. What is more, even for a stochastic
process with a large amount of data, there generally exist
structural breaks in the trajectory due to sudden
shocks from the environment and/or the system's transition
to a contrasting dynamical regime. For example, a
stock market is frequently shocked by currency and tax
policies, and the earth's motion may stay in different
dynamical regimes before and after an earthquake. We should
therefore separate the data set into many subsets to detect the
behaviors in the different structural patterns. A small value
of N may induce large statistical fluctuations or even bias
in physical quantities such as the PDF, entropy, moments and
so on. In a recent paper, Bonachela et al. revisit
the search for improved estimators of entropy for small
data sets [8]. They also propose a new "balanced estimator"
that outperforms other currently available ones
when the data sets are small and the probabilities $p(x,t)$ are not close to
zero.
Stimulated by the two efforts mentioned above, in the present
paper we introduce a new concept, called Balanced Estimator
of Diffusion Entropy (BEDE), in which the balanced
estimator of entropy replaces the original
form in the diffusion entropy. This concept is used to
find scalings and structural breaks in artificial and empirical
series. Firstly, we briefly review the concepts of
diffusion entropy and balanced estimator, and then introduce
the concept of BEDE. Secondly, the effectiveness
of the BEDE in detecting scalings in short time
series is verified by means of a large number of fractional
Brownian motions. Finally, we detect the scalings
and structural breaks in the stock price series of the Shanghai
Stock Market.
II. METHODS AND MATERIALS

To keep the description self-contained, we review 
briefly the concepts of diffusion entropy and balanced 
estimator. 



A. Diffusion Entropy 

Let us consider a one-dimensional stationary time series,

$$\xi_1, \xi_2, \ldots, \xi_N, \tag{2}$$

where N is the length of the series. All the possible segments
of length s read

$$X_i = \{\xi_i, \xi_{i+1}, \ldots, \xi_{i+s-1}\}, \quad i = 1, 2, \ldots, N-s+1. \tag{3}$$

Regarding the length s as time, the vector $X_i$ can be
regarded as a stochastic trajectory of a particle starting
from its initial position $X_i(0) = 0$. In this way, the time
series (2) is mapped to an ensemble containing $N-s+1$
realizations of a stochastic process. The displacements
read

$$x_i(s) = \sum_{j=1}^{s} X_i(j) = \sum_{j=i}^{i+s-1} \xi_j, \quad i = 1, 2, \ldots, N-s+1. \tag{4}$$

One can divide the displacement interval where the
particle appears into M(s) bins, and count the number
of the particle's occurrences in each bin at time s.
We denote these numbers by $N_j(s)$, $j = 1, 2, \ldots, M(s)$.
The PDF can be naively approximated by the relative
frequency,

$$p(j,s) \approx \hat{p}(j,s) = \frac{N_j(s)}{N-s+1}, \quad j = 1, 2, \ldots, M(s). \tag{5}$$

The entropy of the diffusion process is consequently determined,
which reads

$$S_{DE}(s) \approx S_{DE}^{naive}(s) = -\sum_{j=1}^{M(s)} \hat{p}(j,s) \ln[\hat{p}(j,s)]. \tag{6}$$

This entropy is based upon the diffusion process constructed
from the original series (2), and for this reason it is
called Diffusion Entropy (DE) [9].
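The construction of Eqs. (4)-(6) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the bin width `eps` is passed in by the caller, and bins are assumed to start at the smallest displacement.

```python
import math

def displacements(xi, s):
    """Eq. (4): sums over all N - s + 1 overlapping windows of length s."""
    prefix = [0.0]
    for value in xi:
        prefix.append(prefix[-1] + value)
    return [prefix[i + s] - prefix[i] for i in range(len(xi) - s + 1)]

def occupied_bin_counts(x, eps):
    """Occupation numbers N_j(s) of the non-empty bins of width eps.

    Assumption: the first bin starts at min(x).
    """
    x_min = min(x)
    counts = {}
    for value in x:
        j = int((value - x_min) // eps)
        counts[j] = counts.get(j, 0) + 1
    return list(counts.values())

def naive_de(xi, s, eps):
    """Eq. (6) evaluated with the relative frequencies of Eq. (5)."""
    counts = occupied_bin_counts(displacements(xi, s), eps)
    total = sum(counts)  # equals N - s + 1
    return -sum(c / total * math.log(c / total) for c in counts)
```

Empty bins can be skipped here because a bin with zero frequency contributes nothing to the naive entropy sum.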

The key step in the calculation of the DE is how to choose
the size of the bins, $\epsilon(s)$. The easiest way is to assume it
to be a fraction of the square root of the variance of the
original series (2) and independent of s.

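Now we assume that the time series is scale invariant,
namely, p(j,s) obeys the relation (1),

$$p(j,s) = \frac{1}{s^{\delta}} F\left(\frac{x_{min}(s) + (j-0.5)\,\epsilon(s)}{s^{\delta}}\right) \epsilon(s), \quad j = 1, 2, \ldots, M(s), \tag{7}$$

where $x_{min}(s)$ is the smallest value of displacement, i.e.,
$x_{min}(s) = \min[x_1(s), x_2(s), \ldots, x_{N-s+1}(s)]$. Plugging
Eq. (7) into Eq. (6), a simple computation leads to

$$S_{DE}(s) = A + \delta \ln(s), \tag{8}$$

where $A = -\int_{-\infty}^{+\infty} dy\, F(y) \ln[F(y)]$.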
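The step from the scaling form to the linear law of Eq. (8) can be made explicit. The sketch below assumes that the probability of bin j is approximated by the density of Eq. (1) at the bin center $x_j$ times an s-independent bin width $\epsilon$; the symbols $x_j$ and $y_j$ are introduced here only for illustration.

```latex
\begin{aligned}
S_{DE}(s) &= -\sum_{j=1}^{M(s)} p(j,s)\,\ln p(j,s),
  \qquad p(j,s) \approx \frac{\epsilon}{s^{\delta}}\,F(y_j),
  \quad y_j = \frac{x_j}{s^{\delta}},\\
 &= -\sum_{j=1}^{M(s)} \frac{\epsilon}{s^{\delta}}\,F(y_j)
    \left[\ln F(y_j) + \ln\epsilon - \delta \ln s\right]\\
 &\approx -\int_{-\infty}^{+\infty} dy\,F(y)\,\ln F(y) \;-\; \ln\epsilon \;+\; \delta \ln s .
\end{aligned}
```

The last line uses $\sum_j (\epsilon / s^{\delta}) F(y_j) \approx \int dy\, F(y) = 1$. Under this assumption the bin width only shifts the intercept by the constant $-\ln\epsilon$, so the slope with respect to $\ln s$ remains $\delta$.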

The simple relation of Eq. (8) can be used to detect
scalings in time series. It is the first tool yielding the
correct scaling in both the Gauss and the Levy statistics.
For this reason, it has attracted special attention from diverse
research fields [10].
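In practice the scaling exponent is read off as the slope of the entropy against ln(s). A minimal sketch of that fit, assuming an ordinary least-squares line (the source does not state which fitting procedure is used, and the function name is hypothetical):

```python
import math

def scaling_exponent(s_values, entropies):
    """Least-squares fit of S(s) = A + delta * ln(s); returns (delta, A)."""
    xs = [math.log(s) for s in s_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(entropies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, entropies))
    delta = sxy / sxx
    return delta, mean_y - delta * mean_x
```

The range of s over which the fit is made matters: only the region where the entropy curve is linear in ln(s) should be passed in.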



B. Balanced Estimator For Diffusion Entropy 

From the relation Eq. (5), writing $n(j,s) \equiv N_j(s)$ for the occupation number of bin j, we have the ensemble average,

$$\langle \hat{p}(j,s) \rangle = \frac{\langle n(j,s) \rangle}{N-s+1} = p(j,s). \tag{9}$$

In other words, the frequencies $\hat{p}(j,s)$ approximate the
probabilities with a certain statistical error (variance) but
without any systematic error (bias). The frequencies
$\hat{p}(j,s)$ are unbiased estimators of the probabilities $p(j,s)$.

However, there is an important difference between
$S_{DE}(s)$ and $S_{DE}^{naive}(s)$ in Eq. (6), stemming from the nonlinear
nature of the entropy functional. Defining an error
variable, $\eta(j,s) = \frac{\hat{p}(j,s) - p(j,s)}{p(j,s)}$, and replacing $p(j,s)$ in
$S_{DE}$ by its value in terms of $\eta(j,s)$ and $\hat{p}(j,s)$, straightforward
algebra leads to [8],

$$S_{DE}(s) = S_{DE}^{naive}(s) + \frac{M(s)-1}{2(N-s+1)} + \mathcal{O}(M(s)). \tag{10}$$

The leading-order error, $\frac{M(s)-1}{2(N-s+1)}$, is significant
for small $N-s+1$ and vanishes only as $(N-s) \to \infty$.
Consequently, $S_{DE}^{naive}(s)$ is a biased estimator of $S_{DE}(s)$,
i.e., it deviates from the true entropy not only statistically
but also systematically.
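The size of this bias is easy to check numerically. The sketch below is an illustration under simple assumptions that are not taken from the source: samples are drawn independently and uniformly over M bins, so the true entropy is ln(M), and the average shortfall of the naive estimate is compared with the leading term (M - 1)/(2N).

```python
import math
import random

def naive_entropy(counts):
    """Plug-in entropy computed from occupation numbers."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def mean_naive_entropy(m_bins, n_samples, trials, rng):
    """Average the naive entropy over many independent uniform samples."""
    acc = 0.0
    for _ in range(trials):
        counts = [0] * m_bins
        for _ in range(n_samples):
            counts[rng.randrange(m_bins)] += 1
        acc += naive_entropy(counts)
    return acc / trials

rng = random.Random(0)
m_bins, n_samples = 10, 50
observed_bias = math.log(m_bins) - mean_naive_entropy(m_bins, n_samples, 4000, rng)
predicted_bias = (m_bins - 1) / (2 * n_samples)  # leading term of the bias
print(observed_bias, predicted_bias)
```

In the diffusion entropy setting the role of the sample size is played by N - s + 1, which shrinks as s grows, so the bias is largest exactly where the scaling region is being fitted.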

An improved estimator of $S_{DE}(s)$ should reduce the
bias or the variance as far as possible, which can be formulated
as follows. Defining $S_{DE}[p(j,s)] = -p(j,s)\ln[p(j,s)]$, we
have $S_{DE}(s) = \sum_{j=1}^{M(s)} S_{DE}[p(j,s)]$. We want to find an
estimator,

$$\hat{S}_{DE}(s) = \sum_{j=1}^{M(s)} \hat{S}_{DE}[n(j,s)], \tag{11}$$

so that the bias,

$$\Delta_{bias}(s) = \langle \hat{S}_{DE}(s) \rangle - S_{DE}(s), \tag{12}$$

or the mean squared deviation,

$$\Delta_{stat}^{2}(s) = \left\langle \left( \hat{S}_{DE}(s) - \langle \hat{S}_{DE}(s) \rangle \right)^{2} \right\rangle, \tag{13}$$

or a combination of both, is as small as possible.

Ignoring correlations between the elements of the distribution,
$n(j,s)$, $j = 1, 2, \ldots, M(s)$, the problem
reduces to minimizing simultaneously the bias and the
variance for each summand,

$$\Delta_{bias}^{2}[p(j,s)] = \left( \langle \hat{S}_{DE}[n(j,s)] \rangle - S_{DE}[p(j,s)] \right)^{2}, \tag{14}$$

and

$$\Delta_{stat}^{2}[p(j,s)] = \left\langle \left( \hat{S}_{DE}[n(j,s)] - \langle \hat{S}_{DE}[n(j,s)] \rangle \right)^{2} \right\rangle, \tag{15}$$

where the probability $p(j,s) \in [0,1]$, and $n(j,s) \in
\{0, 1, 2, \ldots, N-s+1\}$ is binomially distributed. To balance
the errors, we minimize the average error over the
whole range of $p(j,s) \in [0,1]$,

$$\Delta^{2}(j,s) = \int_{0}^{1} dp(j,s)\, w[p(j,s)] \left[ \Delta_{bias}^{2}(j,s) + \Delta_{stat}^{2}(j,s) \right], \tag{16}$$

where $w[p(j,s)]$ is a suitable weight function that depends
on the specific problem. Without extra knowledge of
the probability values, one can consider the simple case
of $w[p(j,s)] = 1$. Inserting Eq. (14) and Eq. (15) into
Eq. (16), the average error is given by

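$$\Delta^{2}(j,s) = \int_{0}^{1} dp(j,s) \left\{ \sum_{n(j,s)=0}^{N-s+1} P_{n(j,s)}[p(j,s)]\, \hat{S}_{DE}^{2}[n(j,s)] + S_{DE}^{2}[p(j,s)] - 2\, S_{DE}[p(j,s)] \sum_{n(j,s)=0}^{N-s+1} P_{n(j,s)}[p(j,s)]\, \hat{S}_{DE}[n(j,s)] \right\}, \tag{17}$$

where $P_{n(j,s)}[p(j,s)]$ is the binomial distribution,

$$P_{n(j,s)}[p(j,s)] = \frac{(N-s+1)!}{n(j,s)!\,[N-s+1-n(j,s)]!}\, [p(j,s)]^{n(j,s)}\, [1-p(j,s)]^{N-s+1-n(j,s)}. \tag{18}$$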

The goal is to determine all the numbers
$\hat{S}_{DE}[n(j,s)]$, $n(j,s) = 0, 1, \ldots, N-s+1$, which lead
to the minimum of the average error. The necessary
condition is that all the partial derivatives vanish, i.e.,

$$\frac{\partial \Delta^{2}(j,s)}{\partial \hat{S}_{DE}[n(j,s)]} = 0, \quad n(j,s) = 0, 1, \ldots, N-s+1. \tag{19}$$

A detailed computation leads to

$$\hat{S}_{DE}[n(j,s)] = (N-s+2) \int_{0}^{1} dp(j,s)\, P_{n(j,s)}[p(j,s)]\, S_{DE}[p(j,s)] = \frac{n(j,s)+1}{N-s+3} \sum_{k=n(j,s)+2}^{N-s+3} \frac{1}{k}. \tag{20}$$
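The closed form of the balanced summand can be checked against the integral it comes from. The sketch below is a numerical sanity check, not part of the original derivation: `n_total` stands for N - s + 1, the function names are hypothetical, and the integral is approximated by a simple midpoint rule.

```python
import math

def balanced_term(n, n_total):
    """Closed form: (n + 1) / (n_total + 2) * sum_{k = n + 2}^{n_total + 2} 1 / k."""
    harmonic_tail = sum(1.0 / k for k in range(n + 2, n_total + 3))
    return (n + 1) / (n_total + 2) * harmonic_tail

def integral_term(n, n_total, steps=200_000):
    """(n_total + 1) * integral over [0, 1] of P_n(p) * (-p ln p), midpoint rule."""
    binom = math.comb(n_total, n)
    h = 1.0 / steps
    acc = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        weight = binom * p ** n * (1.0 - p) ** (n_total - n)  # binomial probability
        acc += weight * (-p * math.log(p))
    return (n_total + 1) * acc * h

for n, n_total in [(0, 1), (3, 10), (10, 10)]:
    print(n, n_total, balanced_term(n, n_total), integral_term(n, n_total))
```

For the smallest case, n = 0 and one realization, both expressions reduce to 5/18, which can be verified by hand from the integrals of p ln p and p^2 ln p.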

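The final improved estimator of diffusion entropy
reads

$$\hat{S}_{DE}(s) = \frac{1}{N-s+3} \sum_{j=1}^{M(s)} \left[ N_j(s) + 1 \right] \sum_{k=N_j(s)+2}^{N-s+3} \frac{1}{k}, \tag{21}$$

which is called Balanced Estimator of Diffusion Entropy (BEDE)
in this paper.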
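A compact way to evaluate the balanced estimator is to precompute harmonic numbers once per value of s. The following is a sketch under stated assumptions rather than the authors' code: the function names are hypothetical, and M(s) is taken to be the number of bins of width `eps` between the smallest and largest displacement, empty bins included, since an empty bin still contributes to the balanced sum.

```python
def bede_from_counts(counts):
    """Balanced entropy estimate from the occupation numbers of all M(s) bins."""
    n_total = sum(counts)  # N - s + 1 realizations
    harmonic = [0.0]       # harmonic[k] = 1/1 + ... + 1/k
    for k in range(1, n_total + 3):
        harmonic.append(harmonic[-1] + 1.0 / k)

    def tail(n):
        # sum_{k = n + 2}^{n_total + 2} 1 / k
        return harmonic[n_total + 2] - harmonic[n + 1]

    return sum((n + 1) * tail(n) for n in counts) / (n_total + 2)

def bede(xi, s, eps):
    """BEDE of the series xi at window length s with bin width eps."""
    prefix = [0.0]
    for value in xi:
        prefix.append(prefix[-1] + value)
    x = [prefix[i + s] - prefix[i] for i in range(len(xi) - s + 1)]
    x_min = min(x)
    m_bins = int((max(x) - x_min) // eps) + 1
    counts = [0] * m_bins
    for value in x:
        counts[int((value - x_min) // eps)] += 1
    return bede_from_counts(counts)
```

Because empty bins are counted, the result depends on how M(s) is defined; a different convention for the binned interval would change the intercept of the entropy curve.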



C. Materials 

1. Fractional Brownian Motions 

Fractional Brownian motions (fBm) [12] are used to
verify the effectiveness of BEDE in detecting scalings in
short time series. An fBm is a continuous-time Gaussian
process depending on the Hurst parameter $0 < H < 1$.
The fBm is self-similar in distribution and the variance
of the increments scales as
$\mathrm{Var}(fBm(t) - fBm(s)) \propto |t-s|^{2H}$.
The program wfbm.m in Matlab® is used to
generate the fBm series.
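For readers without Matlab, short fBm paths can also be produced from the exact covariance of the increments. The sketch below uses a plain Cholesky factorization, which is a different technique from whatever wfbm.m implements, and it costs O(n^3) operations, so it is only practical for short series. All function names are hypothetical.

```python
import math
import random

def fgn_autocovariance(lag, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at a given lag."""
    k = abs(lag)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + abs(k - 1) ** (2 * hurst))

def cholesky(a):
    """Lower-triangular factor L of a symmetric positive definite matrix (a = L L^T)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def fbm(n, hurst, rng):
    """Return an fBm path with n increments, starting at 0."""
    cov = [[fgn_autocovariance(i - j, hurst) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [0.0]
    for i in range(n):
        increment = sum(low[i][k] * z[k] for k in range(i + 1))
        path.append(path[-1] + increment)
    return path
```

A call such as `fbm(200, 0.7, random.Random(0))` returns 201 points. For H = 0.5 the covariance matrix is the identity and the path reduces to an ordinary random walk.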



2. Shanghai Stock Exchange Indices 

The empirical data are the time series of stock price indices
from the Shanghai Stock Exchange (SSE) [13], the
world's 5th largest stock market by market capitalization,
at USD 2.7 trillion as of Dec 2010. The current exchange
was established on November 26, 1990 and began operation
on December 19 of the same year. We collect in total
134 closing price series of individual stocks, from the end of
1995 to the end of June 2010; the numbers
of stocks in the categories of industry,
business, real estate, public utility, and comprehension
are 64, 27, 12, 12 and 19, respectively. We also consider
the stock price indices of the five categories from December
6, 1994 to June 30, 2010. The SSE index series starts
on December 19, 1990 and ends on June 30, 2010.

For a closing price series, p(t), one can construct
the corresponding return series,

$$r(t) = \frac{\log_a[p(t+\Delta t)]}{\log_a[p(t)]}. \tag{22}$$

In the calculations, $\Delta t$ is set to 5, i.e., the weekly
return ratio is considered.



III. RESULTS 

Fig. 1 presents several typical examples of the comparison
between BEDE and DE. For the three generated fBm
series, (a)-(c), with H = 0.7 and lengths 650, 650 and 4000,
respectively, the BEDEs obey almost
perfect linear relations versus ln(s) over a wide range of s, as shown in (d)-(f),
i.e., the scalings are all clearly revealed. The
estimated values of the scaling exponent (the slope of the BEDE
curve) are 0.70, 0.67 and 0.71, which agree well with
the expected value of H = 0.7. However,
for the short series in (a) and (b), as the time s
increases the DE curves tend to bend down and the
deviations from the linear relations of BEDE versus ln(s)

FIG. 1: (Color online) Typical examples of the comparison
between BEDE and DE. (a)-(c) Generated fBm series
with H = 0.7 and lengths 650, 650 and 4000, respectively.
(d)-(f) Over a wide range of s, the BEDEs obey almost
perfect linear relations versus ln(s). The estimated values
of the scaling exponent (the slope of the BEDE curve) are
0.70, 0.67 and 0.71. As the time s increases, the DE
curves tend to bend down and the deviations from the
linear relations of BEDE versus ln(s) become more and
more significant. For a longer series, e.g., of length 4000,
the DE curve lies much
closer to the BEDE curve over a wider range of ln(s), as
shown in (f).



FIG. 3: (Color online) BEDE-based scaling estimations
for the SSE index, and the indices of the five catalogues
including industry, business, real estate, public utility,
and comprehension. (a) Over a wide range of s, the relations
of BEDE versus ln(s) obey a linear law. The scaling
exponents are $\delta_{sse} = 0.63$, $\delta_{biz} = 0.62$, $\delta_{ind} = 0.65$, $\delta_{rea} =
0.64$, $\delta_{pub} = 0.63$, and $\delta_{com} = 0.60$, respectively. (b)
The scaling estimations for the selected 134 stocks. The
scaling exponent for each catalogue is significantly larger
than those of the stocks included in the corresponding
catalogue.

FIG. 2: (Color online) Confidence of BEDE-based scaling
estimation. (a) $10^4$ series with N = 650 and H =
0.5, 0.6 and 0.7 are generated, respectively. The scaling
estimates are normally distributed and centered at the
expected values of $\delta$ = 0.5, 0.6 and 0.7. (b) The relation
of certainty level versus series length at a specified confidence
interval $\Delta H = 0.08$.



become more and more significant. For a longer series,
e.g., the length 4000 in Fig. 1(c), the DE curve lies
much closer to the BEDE curve
over a wider range of ln(s), but in the region of larger
s it still bends down with an unacceptable bias.
Hence, for a series as short as $N \sim 650$, the
calculated values of DE have unacceptable errors due to
bias. The curve of DE versus ln(s) cannot detect
the scaling correctly at all, while the BEDE gives good
estimates of the entropy even for considerably large s,
namely, for a small set of data ($N-s+1$ records). We must therefore
correct the bias in the entropy by means of the BEDE when
analyzing the stock price series of the SSE, which has only a
short history of about 15 years ($\sim 10^3$ in length).

For a specific value of H, one can generate a large
number of fBm series. It is found that the scaling estimates
are normally distributed; a typical example is shown in Fig. 2(a),
in which a total of $10^4$ series with N = 650
and H = 0.5, 0.6 and 0.7 are used. By specifying a confidence
interval $[H - \Delta H, H + \Delta H]$, the corresponding level
of certainty $p_{conf}$ is determined so that $p_{conf} \cdot N_{conf}$
estimates fall in the confidence interval, where $N_{conf}$ is the
total number of generated fBm series. Fig. 2(b) shows
the relation of the certainty level $p_{conf}$ versus the series
length N. At the beginning, $p_{conf}$ increases rapidly with
increasing N, while for large N it
tends to saturate at a high value. Accordingly, we

FIG. 4: (Color online) Evolution of the local scaling
estimation. The segment length is chosen to be 650. The
SSE index series and the $\delta$ evolution curve are matched
with $\Delta t = 60$. The value of $\delta$ ranges over a wide
interval from 0.53 to 0.85. The scaling exponent for the total
series is 0.63. There exist globally four peaks covering
355, 676, 1704 and 1396 data points, namely, persisting
roughly 18, 34, 85 and 70 months, respectively. The
significant transitions marked with ST#1, ST#2 and ST#3
correspond to the bull market from July
29, 1994 to Sept. 13, 1994, the bull market
from Jan. 19, 1996 to May 12, 1997, and the RSSLC-induced
increase of the SSE index for about 24 months (starting
from May 9, 2005), respectively.



select N = 650 in the following calculations to detect the
local scaling behaviors of the stock series in the SSE
market.
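The certainty level used above to choose the segment length is simply the fraction of scaling estimates that land inside the confidence interval. A minimal sketch, with a hypothetical function name and made-up example values:

```python
def certainty_level(estimates, hurst, delta_h):
    """Fraction of estimates inside [hurst - delta_h, hurst + delta_h]."""
    inside = sum(1 for d in estimates if hurst - delta_h <= d <= hurst + delta_h)
    return inside / len(estimates)

# Illustrative values only, not results from the paper.
example = [0.60, 0.70, 0.75, 0.80]
print(certainty_level(example, hurst=0.7, delta_h=0.08))
```

Repeating this for ensembles of fBm series of different lengths N gives a curve of certainty level against N of the kind described for Fig. 2(b).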

We calculate the BEDEs for the Shanghai Stock Exchange
index (SSE), and the indices of the five catalogues
including industry, business, real estate, public utility,
and comprehension. Over a wide range of s, the relations
of BEDE versus ln(s) obey a linear law. Hence, there
exist almost perfect self-similarities and the scaling exponents
are $\delta_{sse} = 0.63$, $\delta_{biz} = 0.62$, $\delta_{ind} = 0.65$, $\delta_{rea} =
0.64$, $\delta_{pub} = 0.63$, and $\delta_{com} = 0.60$, respectively (see
Fig. 3(a)). The scaling estimates for the selected 134
stocks are also calculated, as shown in Fig. 3(b). The
scaling exponent for each catalogue is significantly larger
than those of the stocks included in the corresponding
catalogue. Though each specific stock is almost
unpredictable, the catalogue it belongs to, i.e., a
combination of the stocks in the catalogue, is much more
predictable.

For the SSE series, $\{r_{SSE}(1), r_{SSE}(2), \ldots, r_{SSE}(N)\}$,
one can calculate the scaling exponents for all the segments
$\{r_{SSE}(t-s+1), r_{SSE}(t-s+2), \ldots, r_{SSE}(t)\}$, $t =
s, s+1, \ldots, N$, denoted by $\delta_{SSE}(t - \Delta t)$, which are employed
in the present work to represent the local scalings
of the SSE series, as shown in Fig. 4. The value of s is
chosen to be 650. Suppose a structural break occurs at
time t. Only when the segment covers a certain number of
data points after the time t does the contribution from the break
become significant and detectable. We introduce
the parameter $\Delta t$ to describe this kind of delay
effect.
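The sliding-window procedure can be sketched as follows. The names are hypothetical: `estimate_delta` stands for any routine that returns a scaling exponent for one segment (for instance a BEDE-based slope fit), and `delay` plays the role of the delay parameter that shifts the time stamp attached to each local estimate.

```python
def local_scaling(series, window, estimate_delta, delay=0):
    """Return (t - delay, delta) for every segment of the given length ending at t.

    Times are 1-based, so t runs from window to len(series).
    """
    results = []
    for t in range(window, len(series) + 1):
        segment = series[t - window:t]
        results.append((t - delay, estimate_delta(segment)))
    return results

# Toy check with a stand-in estimator (the segment sum), for illustration only.
print(local_scaling(list(range(10)), 4, sum, delay=1)[:2])
```

With the segment length 650 and the delay 60 quoted in the text, each local estimate computed from the segment ending at t would be plotted at t - 60.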

It is found that over the more than ten years covered,
the value of $\delta$ ranges over a wide interval from 0.53
to 0.85. The scaling exponent for the total series is 0.63, a value
comparatively closer to the lower bound 0.53. What is
more, though there exist rich fine structures with locally
abrupt changes, there are globally four peaks covering
355, 676, 1704 and 1396 data points, namely, persisting
roughly 18, 34, 84 and 69 months, respectively.

It is reasonable to believe that important events,
such as policies and/or emergencies, may lead to rapid
transitions of a stock market from lower (higher) to
higher (lower) predictability. From the evolutionary curve
of $\delta$, one can find three sharp transitions marked with
red arrows and denoted by ST#1, ST#2 and ST#3,
respectively. The distances between the successive transitions
are about 17.5 and 112.5 months. By comparing
with the important events in the history of the
SSE market [13], the value of $\Delta t$ is determined to be 60.
In this way the stock series and the $\delta$ evolutionary series
are matched along time, as shown in Fig. 4.

The first transition, ST#1, corresponds to the bull
market from July 29, 1994 to Sept.
13, 1994. Before this bull market, the market had suffered
a 17-month decline. The China
Securities Regulatory Commission (CSRC) issued three
special policy regulation items to bail out the stock market.
Accordingly, the SSE index increased rapidly from
325 to 1052 within one and a half months (reaching the record
on Sept. 13, 1994).

The second transition, ST#2, matches the bull
market from Jan. 19, 1996 to May 12, 1997,
in which the stock index rose from 512 to 1464. At that
time, speculation in blue chip stocks tended to dominate the
investment concept. The Shenzhen Development Bank
and Sichuan Changhong successively became the leading
stocks in the Shenzhen Stock Exchange market and the
Shanghai Stock Exchange market, respectively. Stimulated
by the two stocks, the SSE market became highly
active, and after the National Day of China the prices of
almost all stocks increased rapidly. The CSRC successively
issued several policy regulation items to cool down the
stock market and expounded in detail the irrational state
of the stock market.

Starting from June 13, 2001, the day a local maximum
occurs at level 2242, a decreasing process persists for about
48 months, during which a rebound maximum occurs at
level 1778 on April 6, 2004. On May 9, 2005 the reform of
the shareholder structure of listed companies (RSSLC)
is conducted, which induces a persistent increase of the SSE
index for about 24 months. This event accords with the
third transition, ST#3. Then the persistent increase is
disturbed into a chaotic state by the escalation of the stamp
tax on May 30, 2007.

IV. CONCLUSIONS 

Scale invariance holds in a large number of complex
systems, but the evaluation of scaling is still a challenging
task. Theoretically, variance-based methods cannot
correctly detect the scalings for Levy processes. Empirically,
time series are usually not long enough to derive a
reliable scaling exponent. What is more, in a long time
series there usually exist structural breaks. In the literature,
diffusion entropy has been developed to detect reliable scaling
exponents for long time series. In the present paper, the
balanced estimation of entropy for short time series is
introduced into the diffusion entropy to find reliable scalings
embedded in short time series.

This method can give reliable scalings even for short
time series with length $\sim 10^2$. It is used to detect the
scalings embedded in a total of 134 stocks in the SSE market.
The scaling exponent for each catalogue is significantly
larger than those of the specific stocks included
in this catalogue.

We also detect the local scalings in the SSE index series.
The scalings vary over a large interval from 0.53
to 0.85. Three rapid transitions from high (low) to
low (high) values of $\delta$ occur in the evolutionary curve of
$\delta$, which are used to match the $\delta$ curve with the closing
price curve along time.